Dear Sirs,


This letter report, covering the period from 1 April 1991 through 1 January 1992, consists of
the following sections: new hires, research progress, and equipment expenses.


1. New hires


Two graduate students have been supported under this award since May 1991. Amiya
Bhattacharya, a graduate student in the Computer Science and Engineering department,
joined the project in the Spring quarter of 1991 after obtaining Ph.D. status. With a solid
background in computer science and fault-tolerant computing, Amiya has been focusing
on fault-tolerant optical interconnection architectures and the search for a suitable
performance measure. John Comito, a graduate student in the Electrical and Computer
Engineering department, was recruited after obtaining Ph.D. status in July 1991. With an
extensive background in engineering physics with an emphasis on optics, John is an
excellent candidate for the fault modeling project for opto-electronic systems.

2. Research progress

Please see the two reports in Appendices I and II.


3. Equipment expenses


Several purchases were made to facilitate the development of the project. These include
a NeXT printer, a 68040 upgrade board, and some relevant books on optical computing.
Because the NeXT is being used to support student research in simulation and evaluation,
there is a need for another computer. We are now assessing our needs and what is
available in the hardware/software market.

The three sections above cover both technical and budgetary issues. If any issue is
missing or unclear, please do not hesitate to call me. Thank you.


Ting-Ting Lin
(619) 534-4738
APPENDIX I


Interim Report on

Fault-Tolerant Optical Interconnection and Performance Metrics

Motivation

With the rapid growth in device density achieved by VLSI technology, the design of array
processors has attracted the attention of researchers over the past decade. Starting with the
introduction of systolic arrays, formal strategies for mapping algorithms onto arrays have
been developed [1]. The use of spare or idle processing elements to achieve fault tolerance
became an important aspect of VLSI array processing. Algorithms have been proposed for
concurrent error detection and for system reconfiguration during processor malfunction,
either to accomplish the designated tasks or to accommodate graceful degradation [2,3];
however, formal mapping techniques have not been extended to capture the issues of
redundancy. Furthermore, processor interconnection topologies such as shuffle-exchange or
butterfly, which require global communication links, were not considered practical for large
VLSI designs. The overhead of routing the global links on chip to incorporate an effective
amount of redundancy is prohibitively high in terms of chip area, signal propagation delay
and power dissipation. Proposed free-space optical interconnections offer the possibility of
establishing those global links with comparatively less overhead, making it desirable to
investigate the fault-tolerance capabilities of these networks.

Two problems in this area are worth further investigation. The first concerns the choice of
topology: when an interconnection topology that supports fault tolerance is introduced, there
is at present no means of arriving at the topological design best suited to performing a given
task with the amount of redundancy allowable in a technological framework, e.g. an optical
implementation. The second is the absence of a combined performance and reliability metric.
Traditionally, latency and throughput are used to compare performance in the absence of any
kind of redundancy. Performability, i.e. the probability that the system performs above some
performance level specified as a parameter, is one measure that combines the effects of
performance and reliability [4]. Because performability is not defined directly in terms of
redundancy, it cannot be used as a guiding factor for the topological analysis of a
fault-tolerant network with redundant design. To treat the problem in a unified manner, it is
necessary to introduce a formal redundancy mapping methodology that would help extract
the performance and reliability metrics of the fault-tolerant network. The goal of the
proposed research is to find a representation for redundancy, and a mapping strategy guided
by a new performance criterion, for designing processor interconnections within the scope
of a technology.
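To make the notion of performability concrete, the following Python sketch computes it for a small, hypothetical set of system states. The state names, probabilities, and performance values are invented for illustration and are not results of this project; counting a state that exactly meets the level as "performing above" it is also our assumption.

```python
from typing import Mapping


def performability(state_probs: Mapping[str, float],
                   state_perf: Mapping[str, float],
                   level: float) -> float:
    """Probability that delivered performance is at or above `level`.

    state_probs: probability of each (hypothetical) system state.
    state_perf:  performance delivered in that state, e.g. throughput.
    """
    if abs(sum(state_probs.values()) - 1.0) > 1e-9:
        raise ValueError("state probabilities must sum to 1")
    return sum(p for s, p in state_probs.items() if state_perf[s] >= level)


# Hypothetical array with 4 active PEs and 1 spare: a single failure is
# masked by the spare, a second failure degrades throughput to 3 units.
probs = {"no_failure": 0.90, "one_failure": 0.08, "two_failures": 0.02}
perf = {"no_failure": 4.0, "one_failure": 4.0, "two_failures": 3.0}

print(performability(probs, perf, level=4.0))  # approximately 0.98
```

Note that the result depends on the chosen level, which is why performability is specified with the level as a parameter.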

Background and Details

Traditionally, redundancy is classified into three types: hardware redundancy, time
redundancy and information redundancy. However, the first two kinds can be viewed as
mappings of the more basic information redundancy onto space or time. In particular,
whenever a function is computed more than once, either at different sites or at different
times, the node representing that computation in the data dependence graph (DG) can be
replicated, producing a graph called a redundant data-dependence graph (RDG). Similarly,
extra bits sent for error detection or correction may be represented by additional edges
between two nodes of a bit-level RDG. The computation represented by the nodes and the
data flow through the edges can thus be chosen appropriately. Effectively, information
redundancy remains the abstraction, which expresses itself physically as hardware
redundancy, time redundancy, or a combination of both.
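As an illustration of node replication, the Python sketch below turns a DG, given as a list of dependence edges, into an RDG by replicating selected nodes. The rule that every copy of a producer feeds every copy of a consumer is an assumption made for this sketch (a real design might instead route the copies through a voter), and the function name `make_rdg` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def make_rdg(dg_edges: List[Edge], copies: Dict[str, int]) -> List[Edge]:
    """Build a redundant DG (RDG) by replicating computation nodes.

    copies[v] is how many times node v is computed (default 1).
    Assumed rule: a dependence u -> v becomes an edge from every copy
    of u to every copy of v.
    """
    def replicas(v: str) -> List[str]:
        k = copies.get(v, 1)
        return [v] if k == 1 else [f"{v}#{i}" for i in range(k)]

    rdg: List[Edge] = []
    for u, v in dg_edges:
        rdg.extend(product(replicas(u), replicas(v)))
    return rdg


# DG a -> b -> c in which b is computed twice (e.g. at two sites).
dg = [("a", "b"), ("b", "c")]
print(make_rdg(dg, {"b": 2}))
# [('a', 'b#0'), ('a', 'b#1'), ('b#0', 'c'), ('b#1', 'c')]
```

Whether the two copies of b execute at different sites or at different times is left open here; that choice is exactly the space/time mapping of information redundancy described above.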

Having defined the RDG in this manner, it remains to examine the directional classification
of RDGs. For systolic algorithms, DGs are shift-invariant, i.e. the dependence arcs do not
change with respect to node positions, so canonical mapping can be applied to obtain a
signal-flow graph (SFG) from the DG. However, this does not ensure that the RDG will also
be shift-invariant; it may instead be directional shift-invariant (DSI) or pseudo directional
shift-invariant (PDSI). It is known that a graph that is at least pseudo-DSI will map to a
structurally time-invariant (STI) graph [1]. The goal is to investigate the restrictions that
apply in different fault-tolerant arrays and networks, so that the features of the RDG can be
better analysed for mapping.
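For readers unfamiliar with canonical mapping, the sketch below shows the usual linear projection of a shift-invariant 2-D DG onto an SFG, using a projection direction d, a schedule vector s, and a processor basis vector p orthogonal to d. Each dependence vector e becomes an SFG edge spanning p·e processors with s·e delays. The vectors and function names are illustrative assumptions, not the mapping used in this project; the point is that a single list of dependence vectors suffices only because the DG is shift-invariant, which is exactly the property an RDG may lack.

```python
from typing import List, Tuple

Vec = Tuple[int, int]
SfgEdge = Tuple[Vec, int, int]  # (dependence e, processor offset, delays)


def dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1]


def canonical_map(deps: List[Vec], d: Vec, s: Vec, p: Vec) -> List[SfgEdge]:
    """Project the dependence vectors of a shift-invariant 2-D DG onto an SFG.

    d: projection direction; s: schedule vector (node n fires at time s.n);
    p: processor basis vector orthogonal to d (node n runs on PE p.n).
    """
    if dot(p, d) != 0:
        raise ValueError("p must be orthogonal to the projection direction d")
    if dot(s, d) <= 0:
        raise ValueError("nodes projected onto one PE must fire at distinct times")
    sfg: List[SfgEdge] = []
    for e in deps:
        delays = dot(s, e)
        if delays < 0:
            raise ValueError(f"schedule violates the dependence {e}")
        sfg.append((e, dot(p, e), delays))
    return sfg


# Hypothetical DG whose every node depends on its neighbors along i and j.
print(canonical_map([(1, 0), (0, 1)], d=(1, 0), s=(1, 1), p=(0, 1)))
# [((1, 0), 0, 1), ((0, 1), 1, 1)]
```

Here (1, 0) becomes a PE-local edge with one delay and (0, 1) links neighboring PEs with one delay.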

To obtain a performance measure by which design alternatives can be compared, the
Huang-Abraham ratio has been used in the context of systolic arrays. It is defined as [2]
$$ R = \frac{P\,B\,T^{2}}{C\,I} $$

where
P = number of PEs,
B = input bandwidth,
T = latency,
C = gross volume of computation,
I = input volume.

The ratio can be factored as

$$ R = \frac{B\,T}{I} \cdot \frac{P\,T}{C} $$

where the reciprocal of the first factor gives the efficiency with which the I/O capability is
used (i.e. a throughput measure), and the reciprocal of the second gives the efficiency with
which the computational capability is used (i.e. a redundancy measure). None of these
metrics is sacrosanct; they should be changed to suit the redundancy formulation used in
the original RDG. Therefore, the goal is to define a generic performance-redundancy metric
in terms of the RDG that offers the same simplicity as the Huang-Abraham ratio does for
arrays. Moreover, it is desirable to establish a theoretical relationship between this
performance-redundancy metric and performability as obtained by probabilistic analysis.
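The factorization can be checked numerically. The Python sketch below evaluates R and the two efficiencies for a hypothetical array design; the parameter values are invented, and all quantities are assumed to be expressed in consistent units (e.g. words and time steps).

```python
from dataclasses import dataclass


@dataclass
class ArrayDesign:
    P: int    # number of PEs
    B: float  # input bandwidth (words per time step)
    T: float  # latency (time steps)
    C: float  # gross volume of computation (operations)
    I: float  # input volume (words)

    def huang_abraham_ratio(self) -> float:
        return self.P * self.B * self.T ** 2 / (self.C * self.I)

    def io_efficiency(self) -> float:
        # Reciprocal of BT/I: share of the available I/O capacity actually used.
        return self.I / (self.B * self.T)

    def compute_efficiency(self) -> float:
        # Reciprocal of PT/C: share of the available PE time (P*T) occupied by C.
        return self.C / (self.P * self.T)


design = ArrayDesign(P=4, B=2.0, T=8.0, C=24.0, I=16.0)
r = design.huang_abraham_ratio()
print(r, design.io_efficiency(), design.compute_efficiency())
# R = 4*2*64 / (24*16) = 512/384, i.e. 4/3; efficiencies 1.0 and 0.75
```

Because R is the product of the reciprocals of the two efficiencies, under this reading a smaller R indicates better use of both the I/O and the computational capability.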
APPENDIX II


Fault Modeling of Opto-Electronic Systems

Introduction

Computers and other digital systems are subject to many different types of faults. These
faults result from design flaws, manufacturing defects, environmental effects, and normal
aging. It is useful to develop models that represent these faults adequately and
systematically. Once these models are developed, they can be used in fault simulation and
test generation. A number of models exist for use in digital systems, including the single
stuck-line, multiple stuck-line, generalized single stuck-line, general functional, and
coupling fault models. The most widely used is the single stuck-line model, because of its
simplicity and effectiveness. Simplicity and functionality are the key characteristics of a
good fault model.
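As a reminder of how the single stuck-line model is used in fault simulation, the Python sketch below injects each single stuck-at fault into a small, hypothetical gate-level circuit, y = (a AND b) OR c with internal line n1 = a AND b, and lists the input vectors that detect it (those for which the faulty and fault-free outputs differ). The circuit and names are ours, chosen only for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Fault = Tuple[str, int]  # (line name, stuck value)
LINES = ["a", "b", "c", "n1", "y"]


def simulate(inputs: Dict[str, int], fault: Optional[Fault] = None) -> int:
    """Evaluate y = (a AND b) OR c, with at most one line stuck at 0 or 1."""
    def line(name: str, value: int) -> int:
        # A stuck line ignores the value driven onto it.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a = line("a", inputs["a"])
    b = line("b", inputs["b"])
    c = line("c", inputs["c"])
    n1 = line("n1", a & b)
    return line("y", n1 | c)


def detecting_tests(fault: Fault) -> List[Tuple[int, int, int]]:
    """Exhaustively list input vectors (a, b, c) that expose `fault`."""
    tests = []
    for a, b, c in product((0, 1), repeat=3):
        vec = {"a": a, "b": b, "c": c}
        if simulate(vec) != simulate(vec, fault):
            tests.append((a, b, c))
    return tests


for name in LINES:
    for stuck in (0, 1):
        print(f"{name} stuck-at-{stuck}: {detecting_tests((name, stuck))}")
```

For example, a stuck-at-0 on input a is exposed only by (a, b, c) = (1, 1, 0). The same machinery applies unchanged once an optical fault has been abstracted to a stuck line.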

The performance of current integrated circuits is reaching a limit in the number of I/O
lines, the number of interconnections, and the complexity of interconnections. These
limitations can be overcome through the introduction of optical devices into the system [2].
Adding optics to the system adds another dimension and, unfortunately, another set of
faults. This new set of faults creates the need for a new set of fault models, so that fault
simulation and fault testing can be performed on the new optical computing systems. The
study of these new optical computing systems for the development of fault models is the
subject of this paper.

Due to the infancy of the optical computing field, only a few working systems exist, and
they represent just a handful of the proposed system designs. The proposed systems use
different technologies, with varying degrees of hybridization between the optical and the
electrical components. This degree of hybridization can affect the final fault models. Also,
the manner in which a system is integrated, and the technology used in this integration,
can make a difference in determining what types of faults may arise.

To obtain a set of fault models, certain assumptions and simplifications must be made.
Also, we must use a level of abstraction that allows the fault models to adequately
represent a large number of the proposed systems. What has been accomplished in the past
three months is a survey of various proposed optical computing systems, covering the
different components they use and how these are connected. A preliminary set of
component groups was then determined from the survey. The groups are as follows: optical
sources, optical detectors, optical interconnections, optical modulators, optical logic, and
optical memory. Each of these groups may contain devices implemented with different
technologies and with varying physical structures. For example, an optical interconnection
may take many different physical forms (such as various waveguide structures, holographic
optical elements, etc.) but still have the same functionality. The component groups were
chosen in the hope that a general fault model could be developed to cover all the devices in
each group.
Before fault models can be developed, it must be determined which physical faults occur in
any given device. Once this is done, the physical faults can be abstracted to the logic level,
i.e., to how each physical fault manifests itself at the logic level. Below we summarize a
partial survey of the types of physical faults that occur within each of the component groups
and their effects on an optical computing system at the logic level.

A) Optical Sources

Many of the physical faults that occur in a laser diode and its bias circuitry can be
modeled as a simple logic line stuck-at fault. Through aging, manufacturing defects, and
manufacturing variances, the bias current needed to drive the laser diode to a required
optical power output will vary [3]. A fault in the bias circuitry can cause the optical power
output to deviate from the necessary level. When the output of the laser is "fanned out", a
reduction in the optical power could leave several of the fanned-out signals with a maximum
intensity too low to be interpreted as a logic one. Thus we can consider this group of signals
as being stuck at logic zero. In the worst case, the current through the laser diode would fall
below the threshold value; this type of fault could also be viewed as a logic stuck-at-zero
fault. Conversely, an increase in optical power output could also cause a fault. When the
increase is moderate, it could be interpreted as a stuck-at-one fault, assuming that the output
is received by a modulator whose contrast ratio is insufficient to reduce the optical output
below the optical logic low level. Another possible outcome of this increase is a catastrophic
failure, in which the power output is so high that it physically damages the optical receivers
or the laser itself.
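The reasoning above can be sketched as a simple power-budget check. The Python code below assumes an idealized linear L-I curve (no output below threshold), equal splitting over the fan-out branches, and an amplitude modulator of fixed contrast ratio on each branch; every parameter value, including the damage level, is hypothetical. A bias current is mapped to stuck-at-0, stuck-at-1, catastrophic, or fault-free, following the cases described in the text.

```python
# All parameter values are hypothetical, chosen only to illustrate the cases.
THRESHOLD_MA = 10.0      # laser threshold current
SLOPE_MW_PER_MA = 0.5    # slope efficiency above threshold
FANOUT = 8               # number of equal fan-out branches
CONTRAST_RATIO = 10.0    # on/off ratio of the amplitude modulator on each branch
P_HIGH_MW = 0.5          # minimum received power read as logic 1
P_LOW_MW = 0.1           # maximum received power read as logic 0
P_DAMAGE_MW = 20.0       # source power assumed to damage receivers or the laser


def laser_power_mw(bias_ma: float) -> float:
    """Idealized linear L-I curve: no lasing output below threshold."""
    return max(0.0, SLOPE_MW_PER_MA * (bias_ma - THRESHOLD_MA))


def classify_bias(bias_ma: float) -> str:
    """Map a (possibly faulty) bias current to a logic-level fault class."""
    p = laser_power_mw(bias_ma)
    if p > P_DAMAGE_MW:
        return "catastrophic"
    on_branch = p / FANOUT                   # modulator transmitting
    off_branch = on_branch / CONTRAST_RATIO  # modulator blocking (leakage)
    if on_branch < P_HIGH_MW:
        return "stuck-at-0"   # no branch can reach the logic-1 level
    if off_branch > P_LOW_MW:
        return "stuck-at-1"   # leakage can no longer be read as logic 0
    return "fault-free"


for bias in (20.0, 17.0, 9.0, 30.0, 60.0):
    print(bias, classify_bias(bias))
```

With these numbers, a nominal 20 mA bias is fault-free, a drop to 17 mA leaves each branch below the logic-one level, 9 mA is below threshold, a rise to 30 mA lets enough light leak through a blocking modulator to read as logic one, and 60 mA exceeds the assumed damage level.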

Furthermore, longitudinal mode "hopping" in lasers due to temperature variations is another
source of faults. Mode hopping results in a change in the lasing frequency of the laser,
which would cause problems with interconnections and beam detection. The full
implications of mode hopping in optical computing systems are currently under
investigation.

B) Optical Detectors

The detectors are the simplest devices among the component groups. Most of their physical
faults can be generalized to fit the logic stuck-at line models. Faults can occur in either the
detector itself or its biasing circuit. Depending on the physical fault, these faults can be
interpreted as either logic stuck-at-one or logic stuck-at-zero faults.

C) Optical Interconnections

Optical interconnections exhibit a wider variety of physical realizations than any of the
other component groups [4]. A large number of different interconnection schemes have
been proposed, but nearly all of them fall into one of two basic categories: waveguides and
free-space interconnections. Waveguides would have faults related to power losses. These
losses could arise from insufficient waveguide coupling or from defects in the waveguide
that increase the power loss per unit distance. If sufficiently large, losses of this type could
produce a logic stuck-at-zero fault.
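The waveguide case reduces to a loss budget in decibels. The sketch below, with hypothetical power levels, losses, and detector threshold, reports a stuck-at-zero fault when coupling loss plus propagation loss pushes the received power below the level needed for a logic one.

```python
def received_dbm(p_in_dbm: float, coupling_loss_db: float,
                 loss_db_per_cm: float, length_cm: float) -> float:
    """Received power after coupling loss and propagation loss (all in dB)."""
    return p_in_dbm - coupling_loss_db - loss_db_per_cm * length_cm


def waveguide_fault(p_in_dbm: float, coupling_loss_db: float,
                    loss_db_per_cm: float, length_cm: float,
                    logic_one_min_dbm: float) -> str:
    """Stuck-at-0 if the received power can no longer reach the logic-1 level."""
    p_rx = received_dbm(p_in_dbm, coupling_loss_db, loss_db_per_cm, length_cm)
    return "stuck-at-0" if p_rx < logic_one_min_dbm else "fault-free"


# Hypothetical link: 0 dBm (1 mW) source, 2 cm guide, detector needs -10 dBm.
print(waveguide_fault(0.0, 3.0, 1.0, 2.0, -10.0))  # nominal: -5 dBm, fault-free
print(waveguide_fault(0.0, 3.0, 4.0, 2.0, -10.0))  # defect: -11 dBm, stuck-at-0
print(waveguide_fault(0.0, 9.0, 1.0, 2.0, -10.0))  # poor coupling: -11 dBm, stuck-at-0
```

Working in decibels keeps coupling and per-distance losses additive, which makes it easy to see how much margin a given link has before the fault appears.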

Developing fault models for free-space interconnections is a problem that requires some
new thinking. A free-space interconnection uses a holographic optical element (HOE) to
perform beam splitting, beam deflection, focusing, beam conditioning, or any combination
of these functions. Aging, misalignment, and manufacturing defects are a few factors that
could cause a fault in a system using an HOE. The most popular type of HOE is the
computer-generated hologram (CGH), because of its compatibility with VLSI techniques,
which leads to a high degree of integration. The effects of faults in such elements are
currently under examination, with CGHs as the main focus.

D) Optical Modulators

Light modulators form another group, which can be divided into two subgroups: amplitude
modulators and deflection modulators. Amplitude modulators exhibit physical faults such as
a low contrast ratio, being stuck in transmission mode, and being stuck in blocking mode.
These faults can be described at the logic level with the stuck-at fault model. For
deflection modulators, a fault would be the inability to deflect the light properly. This type
of fault could range from a non-functioning modulator to one whose deflection angle is
slightly off from the desired angle, which is similar to modeling a fault in a free-space
interconnection. This model is currently under investigation.
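A first-order geometric view of a deflection fault, which also applies to a misaligned free-space element, is to ask whether the beam still lands on its intended detector. The Python sketch below compares the intended and actual spot positions at a given propagation distance; the angles, distance, and detector size are hypothetical, and abstracting a miss to the logic level (e.g. as a stuck-at-zero at the intended detector) is our assumption, since the model is still under investigation.

```python
import math


def spot_offset_um(target_deg: float, actual_deg: float,
                   distance_mm: float) -> float:
    """Distance between intended and actual beam centers on the detector plane."""
    distance_um = distance_mm * 1000.0
    return abs(distance_um * (math.tan(math.radians(actual_deg))
                              - math.tan(math.radians(target_deg))))


def misses_detector(target_deg: float, actual_deg: float, distance_mm: float,
                    detector_halfwidth_um: float) -> bool:
    """True if the beam center falls outside the intended detector."""
    return spot_offset_um(target_deg, actual_deg, distance_mm) > detector_halfwidth_um


# Hypothetical geometry: 5 degree target, detector 10 mm away, 50 um wide.
print(misses_detector(5.0, 0.0, 10.0, 25.0))  # modulator not deflecting at all
print(misses_detector(5.0, 5.1, 10.0, 25.0))  # 0.1 degree error (about 18 um)
print(misses_detector(5.0, 5.2, 10.0, 25.0))  # 0.2 degree error (about 35 um)
```

The sketch covers both ends of the range described above: a non-functioning modulator misses by far, while a slightly wrong angle may or may not miss depending on distance and detector size.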

E) Optical Logic Gates and Optical Memories

Optical logic gates and optical memories are two groups of components that are still in the
early stages of development. Many optical logic gate designs use optical sources and
detectors as their main components, which makes it possible to use the models developed
earlier for sources and detectors. However, devices like the Self Electro-optic Effect Device
(SEED) [5] are entirely new and may need models developed specifically for them. Optical
computing systems using SEEDs are still under development, and it will be some time
before fault models can be developed; when more systems using SEEDs become available,
models for SEEDs will be developed. Optical memories are also in the research and
development stage. Some memory schemes look promising, but it will be a while before
proper fault models can be derived.

In our surveys of optical computing components, we have tried to keep an encompassing
theme that accommodates many different systems. At the same time, we do not want to
trade off too much functionality for generality. It would also be useful to rely on existing
models, so that existing simulation and test generation tools can be used. So far it seems
that some of the existing models can be used, and with further investigation we expect this
to prove true for more of the components. In the future, we would like to further develop
the fault models mentioned in this paper and continue developing fault models for all the
component groups. Once this is achieved, methods for using the fault models will be
developed and demonstrated.